AITopics | Saint Michael

We present a novel approach centered on the decoding stage of Automatic Speech Recognition (ASR) that enhances multilingual performance, especially for low-resource languages. It utilizes a cross-lingual embedding clustering method to construct a hierarchical Softmax (H-Softmax) decoder, which enables similar tokens across different languages to share similar decoder representations. It addresses the limitations of the previous Huffman-based H-Softmax method, which relied on shallow features in token similarity assessments. Through experiments on a downsampled dataset of 15 languages, we demonstrate the effectiveness of our approach in improving low-resource multilingual ASR accuracy.
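Hierarchical softmax replaces one flat softmax over the vocabulary with a sequence of binary decisions down a tree, so a token's probability is the product of the branch probabilities on its root-to-leaf path. The minimal sketch below assumes a hand-built four-leaf binary tree whose two subtrees stand in for cross-lingual clusters. The token names, node vectors, and hidden state are hypothetical placeholders. It does not reproduce the paper's clustering procedure. It only shows how a shared path prefix lets similar tokens share decoder parameters.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Hypothetical inner-node parameters: node 0 is the root, nodes 1 and 2 are
# the roots of two clusters of similar tokens.
NODE_VECTORS = {
    0: [0.1, 0.2, 0.3],
    1: [-0.4, 0.1, 0.0],
    2: [0.3, -0.2, 0.5],
}

# Each token's root-to-leaf path as (node_id, go_left) decisions. Tokens that a
# cross-lingual clustering grouped together share the same path prefix.
PATHS = {
    "cat_en":   [(0, True), (1, True)],
    "chat_fr":  [(0, True), (1, False)],
    "dog_en":   [(0, False), (2, True)],
    "chien_fr": [(0, False), (2, False)],
}

def token_prob(token, hidden):
    """P(token | hidden) as a product of binary branch probabilities."""
    p = 1.0
    for node_id, go_left in PATHS[token]:
        s = sigmoid(dot(NODE_VECTORS[node_id], hidden))
        p *= s if go_left else 1.0 - s
    return p

hidden = [0.5, -1.0, 0.25]  # hypothetical decoder hidden state
probs = {tok: token_prob(tok, hidden) for tok in PATHS}
print(probs)
print(sum(probs.values()))  # sums to 1 up to floating-point error
```

Scoring one token costs a number of sigmoid evaluations equal to the tree depth, rather than a softmax over the whole vocabulary. A gradient step on `cat_en` also updates the root and node 1, which `chat_fr` shares.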

artificial intelligence, h-softmax, machine learning

arXiv.org Artificial Intelligence

2501.17615

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)
Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)

Genre:

Research Report > New Finding (0.93)
Research Report > Promising Solution (0.66)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.49)


ARMAX identification of low rank graphical models

Cao, Wenqi, Li, Aming

arXiv.org Artificial Intelligence, Jan-16-2025

In large-scale systems, complex internal relationships are often present. Such interconnected systems can be effectively described by low-rank stochastic processes. When a predictive model of a low-rank process is identified from sampled data, the rank-deficient property of its spectral density is often obscured by the measurement noise that is inevitable in practice. However, existing low-rank identification approaches often do not explicitly account for noise, leading to non-negligible inaccuracies even under weak noise. In this paper, we address the identification of low-rank processes under measurement noise. We find that the noisy measurement model admits a sparse plus low-rank structure in latent-variable graphical models. Specifically, we first decompose the problem into a maximum entropy covariance extension problem and a low-rank graphical estimation problem based on an autoregressive moving-average with exogenous input (ARMAX) model. To identify the ARMAX low-rank graphical models, we propose an estimation approach based on maximum likelihood. The identifiability and consistency of this approach are proven under certain conditions. Simulation results confirm the reliable performance of the entire algorithm in both parameter estimation and noisy-data filtering.
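For concreteness, a scalar ARMAX model with autoregressive coefficients a, exogenous-input coefficients b, and moving-average coefficients c evolves as y_t = Σ_i a_i y_{t-i} + Σ_j b_j u_{t-j} + e_t + Σ_k c_k e_{t-k}, with all lags starting at 1. The sketch below only simulates this model class from zero initial conditions, and all coefficient values are hypothetical. The paper's setting is multivariate, with low-rank graphical structure and a maximum-likelihood estimator that is not reproduced here.

```python
import random

def simulate_armax(a, b, c, u, e):
    """Simulate a scalar ARMAX model with zero initial conditions:

    y[t] = sum_i a[i]*y[t-1-i] + sum_j b[j]*u[t-1-j] + e[t] + sum_k c[k]*e[t-1-k]
    """
    y = []
    for t in range(len(u)):
        acc = e[t]
        # Autoregressive part: past outputs.
        acc += sum(ai * y[t - 1 - i] for i, ai in enumerate(a) if t - 1 - i >= 0)
        # Exogenous part: past inputs.
        acc += sum(bj * u[t - 1 - j] for j, bj in enumerate(b) if t - 1 - j >= 0)
        # Moving-average part: past driving noise.
        acc += sum(ck * e[t - 1 - k] for k, ck in enumerate(c) if t - 1 - k >= 0)
        y.append(acc)
    return y

rng = random.Random(0)
n = 200
u = [rng.gauss(0.0, 1.0) for _ in range(n)]  # exogenous input
e = [rng.gauss(0.0, 0.1) for _ in range(n)]  # white driving noise
y = simulate_armax(a=[0.6, -0.2], b=[1.0, 0.5], c=[0.3], u=u, e=e)
```

Passing the noise sequence explicitly keeps the simulation deterministic, which makes it easy to check: a unit impulse in u with a = [0.5], b = [1.0], and zero noise produces the decaying response 0, 1, 0.5, 0.25, and so on.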

estimation, graphical model, identification

arXiv.org Artificial Intelligence

2501.09616

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Asia > China > Beijing > Beijing (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.54)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.49)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)


Analysis of the BraTS 2023 Intracranial Meningioma Segmentation Challenge

LaBella, Dominic, Baid, Ujjwal, Khanna, Omaditya, McBurney-Lin, Shan, McLean, Ryan, Nedelec, Pierre, Rashid, Arif, Tahon, Nourel Hoda, Altes, Talissa, Bhalerao, Radhika, Dhemesh, Yaseen, Godfrey, Devon, Hilal, Fathi, Floyd, Scott, Janas, Anastasia, Kazerooni, Anahita Fathi, Kirkpatrick, John, Kent, Collin, Kofler, Florian, Leu, Kevin, Maleki, Nazanin, Menze, Bjoern, Pajot, Maxence, Reitman, Zachary J., Rudie, Jeffrey D., Saluja, Rachit, Velichko, Yury, Wang, Chunhao, Warman, Pranav, Adewole, Maruf, Albrecht, Jake, Anazodo, Udunna, Anwar, Syed Muhammad, Bergquist, Timothy, Chen, Sully Francis, Chung, Verena, Conte, Gian-Marco, Dako, Farouk, Eddy, James, Ezhov, Ivan, Khalili, Nastaran, Iglesias, Juan Eugenio, Jiang, Zhifan, Johanson, Elaine, Van Leemput, Koen, Li, Hongwei Bran, Linguraru, Marius George, Liu, Xinyang, Mahtabfar, Aria, Meier, Zeke, Moawad, Ahmed W., Mongan, John, Piraud, Marie, Shinohara, Russell Takeshi, Wiggins, Walter F., Abayazeed, Aly H., Akinola, Rachel, Jakab, András, Bilello, Michel, de Verdier, Maria Correia, Crivellaro, Priscila, Davatzikos, Christos, Farahani, Keyvan, Freymann, John, Hess, Christopher, Huang, Raymond, Lohmann, Philipp, Moassefi, Mana, Pease, Matthew W., Vollmuth, Phillipp, Sollmann, Nico, Diffley, David, Nandolia, Khanak K., Warren, Daniel I., Hussain, Ali, Fehringer, Pascal, Bronstein, Yulia, Deptula, Lisa, Stein, Evan G., Taherzadeh, Mahsa, de Oliveira, Eduardo Portela, Haughey, Aoife, Kontzialis, Marinos, Saba, Luca, Turner, Benjamin, Brüßeler, Melanie M. T., Ansari, Shehbaz, Gkampenis, Athanasios, Weiss, David Maximilian, Mansour, Aya, Shawali, Islam H., Yordanov, Nikolay, Stein, Joel M., Hourani, Roula, Moshebah, Mohammed Yahya, Abouelatta, Ahmed Magdy, Rizvi, Tanvir, Willms, Klara, Martin, Dann C., Okar, Abdullah, D'Anna, Gennaro, Taha, Ahmed, Sharifi, Yasaman, Faghani, Shahriar, Kite, Dominic, Pinho, Marco, Haider, Muhammad Ammar, Aristizabal, Alejandro, Karargyris, Alexandros, Kassem, Hasan, Pati, Sarthak, Sheller, Micah, Alonso-Basanta, Michelle, Villanueva-Meyer, Javier, Rauschecker, Andreas M., Nada, Ayman, Aboian, Mariam, Flanders, Adam E., Wiestler, Benedikt, Bakas, Spyridon, Calabrese, Evan

arXiv.org Artificial Intelligence, May-15-2024

We describe the design and results of the BraTS 2023 Intracranial Meningioma Segmentation Challenge. The BraTS Meningioma Challenge differed from prior BraTS Glioma challenges in that it focused on meningiomas, which are typically benign extra-axial tumors with diverse radiologic and anatomical presentations and a propensity for multiplicity. Nine participating teams each developed deep-learning automated segmentation models using image data from the largest multi-institutional, systematically expert-annotated, multilabel, multi-sequence meningioma MRI dataset to date, which included 1000 training set cases, 141 validation set cases, and 283 hidden test set cases. Each case included T2, T2/FLAIR, T1, and T1Gd brain MRI sequences with associated tumor compartment labels delineating enhancing tumor, non-enhancing tumor, and surrounding non-enhancing T2/FLAIR hyperintensity. Participants' automated segmentation models were evaluated and ranked with a scoring system based on lesion-wise metrics, including the Dice similarity coefficient (DSC) and the 95% Hausdorff distance. The top-ranked team achieved a lesion-wise median DSC of 0.976, 0.976, and 0.964 for enhancing tumor, tumor core, and whole tumor, respectively, and a corresponding average DSC of 0.899, 0.904, and 0.871, respectively. These results serve as state-of-the-art benchmarks for future pre-operative meningioma automated segmentation algorithms. Additionally, we found that 1286 of 1424 cases (90.3%) had at least one compartment voxel abutting the edge of the skull-stripped image, which warrants further investigation into optimal face-anonymization pre-processing steps.
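The Dice similarity coefficient measures overlap between a predicted mask A and a reference mask B as DSC = 2|A ∩ B| / (|A| + |B|), ranging from 0 (no overlap) to 1 (identical masks). The sketch below computes a plain voxel-wise DSC for one tumor compartment, with masks flattened to 0/1 sequences. The challenge's lesion-wise variant additionally matches connected components and scores each lesion separately, which this sketch does not attempt. Returning 1.0 when both masks are empty is an assumed convention.

```python
def dice(pred, truth):
    """Dice similarity coefficient between two binary masks given as flat 0/1 sequences."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if total == 0:
        return 1.0  # both masks empty: treated as perfect agreement (a convention)
    return 2.0 * intersection / total

print(dice([1, 1, 0, 0], [1, 0, 1, 0]))  # one shared voxel out of 2 + 2 labelled voxels
```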

meningioma, segmentation, university

arXiv.org Artificial Intelligence

2405.09787

Country:

North America > United States > California > San Francisco County > San Francisco (0.29)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.15)
Europe > Switzerland > Zürich > Zürich (0.14)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Nuclear Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Statistical Mechanics and Artificial Neural Networks: Principles, Models, and Applications

Böttcher, Lucas, Wheeler, Gregory

arXiv.org Artificial Intelligence, Apr-5-2024

The field of neuroscience and the development of artificial neural networks (ANNs) have mutually influenced each other, drawing from and contributing to many concepts initially developed in statistical mechanics. Notably, Hopfield networks and Boltzmann machines are versions of the Ising model, a model extensively studied in statistical mechanics for over a century. In the first part of this chapter, we provide an overview of the principles, models, and applications of ANNs, highlighting their connections to statistical mechanics and statistical learning theory. Artificial neural networks can be seen as high-dimensional mathematical functions, and understanding the geometric properties of their loss landscapes (i.e., the high-dimensional space on which one wishes to find extrema or saddles) can provide valuable insights into their optimization behavior, generalization abilities, and overall performance. Visualizing these functions can help us design better optimization methods and improve their generalization abilities. Thus, the second part of this chapter focuses on quantifying geometric properties and visualizing loss functions associated with deep ANNs.
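An Ising model assigns the energy E(s) = -1/2 Σ_{i≠j} J_ij s_i s_j to spins s_i ∈ {-1, +1}. A Hopfield network uses the same energy with couplings set by a Hebbian rule, and asynchronous sign updates never increase it, so stored patterns act as energy minima. The minimal sketch below stores one pattern, corrupts a spin, and recovers the pattern. The pattern, network size, and number of sweeps are arbitrary choices, not taken from the chapter.

```python
def hebbian_weights(patterns):
    """Hebbian couplings W_ij = (1/n) * sum over patterns of xi_i * xi_j, with zero diagonal."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for xi in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += xi[i] * xi[j] / n
    return w

def energy(w, s):
    """Ising-type energy E(s) = -1/2 * sum_{i,j} W_ij s_i s_j (no external field)."""
    n = len(s)
    return -0.5 * sum(w[i][j] * s[i] * s[j] for i in range(n) for j in range(n))

def recall(w, s, sweeps=3):
    """Asynchronous updates s_i <- sign(sum_j W_ij s_j); records the energy after every update."""
    s = list(s)
    energies = [energy(w, s)]
    for _ in range(sweeps):
        for i in range(len(s)):
            field = sum(w[i][j] * s[j] for j in range(len(s)))
            s[i] = 1 if field >= 0 else -1
            energies.append(energy(w, s))
    return s, energies

pattern = [1, -1, 1, -1, 1, -1, 1, 1]
w = hebbian_weights([pattern])
noisy = [-pattern[0]] + pattern[1:]  # flip one spin
recovered, energies = recall(w, noisy)
print(recovered == pattern)
```

Because W is symmetric with a zero diagonal, each single-spin update changes the energy by -(s_i' - s_i) times the local field, which is never positive.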

curvature, neural network, projection

arXiv.org Artificial Intelligence

2405.10957

Country:

North America > United States > New York > Richmond County > New York City (0.14)
North America > United States > New York > Queens County > New York City (0.14)
North America > United States > New York > New York County > New York City (0.14)

Genre:

Overview (0.68)
Research Report > New Finding (0.46)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.92)


Practice with Graph-based ANN Algorithms on Sparse Data: Chi-square Two-tower model, HNSW, Sign Cauchy Projections

Li, Ping, Zhao, Weijie, Wang, Chao, Xia, Qi, Wu, Alice, Peng, Lijun

arXiv.org Machine Learning, Jun-13-2023

Sparse data are common. Traditional "handcrafted" features are often sparse. Embedding vectors from trained models can also be very sparse, for example, embeddings trained with the ReLU activation function. In this paper, we report our exploration of efficient search in sparse data with graph-based ANN algorithms (e.g., HNSW, or SONG, the GPU version of HNSW), which are popular in industrial practice, e.g., search and ads (advertising). We experiment with a proprietary ads-targeting application as well as public benchmark datasets. For ads targeting, we train embeddings with the standard "cosine two-tower" model, and we also develop the "chi-square two-tower" model. Both models produce (highly) sparse embeddings when they are combined with the ReLU activation function. In EBR (embedding-based retrieval) applications, after the embeddings are trained, the next crucial task is approximate near neighbor (ANN) search for serving. While there are many ANN algorithms to choose from, in this study we focus on graph-based ANN algorithms (e.g., HNSW-type). Sparse embeddings should help improve the efficiency of EBR. One benefit is the reduced memory cost of storing the embeddings. The other obvious benefit is the reduced computational time for evaluating similarities, because for graph-based ANN algorithms such as HNSW, computing similarities is often the dominant cost. In addition to leveraging data sparsity for storage and computation, we also integrate "sign Cauchy random projections" (SignCRP) to hash vectors to bits, which further reduces memory cost and speeds up ANN search. SignCRP was proposed at NIPS'13 to hash the chi-square similarity, a well-adopted nonlinear kernel in NLP and computer vision. Therefore, the chi-square two-tower model, SignCRP, and HNSW are now tightly integrated.
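For nonnegative vectors, the chi-square similarity is ρ = Σ_i 2 x_i y_i / (x_i + y_i). Sign Cauchy random projections hash a vector to one bit per column of a matrix with i.i.d. standard Cauchy entries by keeping the sign of each projection. To our understanding, the NIPS'13 work reports that two vectors' bits collide with probability approximately 1 - arccos(ρ)/π. The sketch below is a dense-matrix illustration under that assumption, with hypothetical sizes and toy vectors. It skips zero coordinates, which is where sparsity saves work, but it is not the paper's production pipeline.

```python
import math
import random

def chi2_similarity(x, y):
    """Chi-square similarity sum_i 2*x_i*y_i / (x_i + y_i) for nonnegative vectors."""
    return sum(2.0 * a * b / (a + b) for a, b in zip(x, y) if a + b > 0)

def cauchy_matrix(d, k, seed=0):
    """d x k matrix of i.i.d. standard Cauchy entries via inverse-CDF sampling."""
    rng = random.Random(seed)
    return [[math.tan(math.pi * (rng.random() - 0.5)) for _ in range(k)] for _ in range(d)]

def sign_cauchy_bits(x, r):
    """One bit per column of r: the sign of the Cauchy projection of x."""
    nonzero = [(i, v) for i, v in enumerate(x) if v != 0.0]  # only nonzeros contribute
    bits = []
    for j in range(len(r[0])):
        z = sum(v * r[i][j] for i, v in nonzero)
        bits.append(1 if z >= 0 else 0)
    return bits

def collision_rate(bx, by):
    """Fraction of positions where two bit strings agree."""
    return sum(1 for a, b in zip(bx, by) if a == b) / len(bx)

def approx_collision(rho):
    """Approximate collision probability 1 - arccos(rho)/pi (assumed form)."""
    return 1.0 - math.acos(rho) / math.pi

d, k = 4, 4000
r = cauchy_matrix(d, k)
x = [0.5, 0.5, 0.0, 0.0]
y = [0.0, 0.0, 0.5, 0.5]  # disjoint support: chi-square similarity 0
bx, by = sign_cauchy_bits(x, r), sign_cauchy_bits(y, r)
print(chi2_similarity(x, y), collision_rate(bx, by), approx_collision(0.0))
```

With disjoint supports the two projections are independent and symmetric, so about half the bits agree, matching the approximation at ρ = 0. Each bit is 1/32 of a float32, which is where the memory savings come from.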

artificial intelligence, machine learning, natural language

arXiv.org Machine Learning

2306.07607

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Canada > Quebec > Montreal (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)


Calibrated Propensity Scores for Causal Effect Estimation

Deshpande, Shachi, Kuleshov, Volodymyr

arXiv.org Artificial Intelligence, Jun-1-2023

Propensity scores are commonly used to balance observed covariates while estimating treatment effects. Estimates obtained through propensity score weighting can be biased when the propensity score model cannot learn the true treatment assignment mechanism. We argue that the probabilistic output of a learned propensity score model should be calibrated, i.e., among individuals with a predicted treatment probability of 90%, 90% should actually be assigned to the treatment group. We propose simple recalibration techniques to ensure this property. We investigate the theoretical properties of a calibrated propensity score model and its role in unbiased treatment effect estimation. We demonstrate improved causal effect estimation with calibrated propensity scores in several tasks, including high-dimensional genome-wide association studies, where we also show reduced computational requirements when calibration is applied to simpler propensity score models.
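One simple way to recalibrate a propensity model is histogram binning: group units by raw score and replace each score with the observed treatment rate in its bin. The calibrated scores then feed an inverse-propensity-weighted (IPW) estimate of the average treatment effect, (1/n) Σ [T·Y/e - (1-T)·Y/(1-e)]. The sketch below assumes histogram binning as the recalibrator; the abstract does not specify which techniques the paper uses. The function names and toy data are hypothetical. In practice the calibrator should be fit on held-out or cross-fitted data rather than the data it is applied to.

```python
def fit_histogram_calibrator(scores, treated, n_bins=10):
    """Histogram binning: map a raw score to the observed treatment rate in its bin."""
    counts = [0] * n_bins
    positives = [0] * n_bins
    for s, t in zip(scores, treated):
        b = min(int(s * n_bins), n_bins - 1)
        counts[b] += 1
        positives[b] += t
    # Empty bins fall back to the bin midpoint (an arbitrary choice for this sketch).
    table = [positives[b] / counts[b] if counts[b] else (b + 0.5) / n_bins
             for b in range(n_bins)]

    def calibrate(s):
        return table[min(int(s * n_bins), n_bins - 1)]
    return calibrate

def ipw_ate(propensities, treated, outcomes, eps=1e-3):
    """Inverse-propensity-weighted estimate of the average treatment effect."""
    total = 0.0
    for e, t, y in zip(propensities, treated, outcomes):
        e = min(max(e, eps), 1.0 - eps)  # clip to avoid division by zero
        total += t * y / e - (1 - t) * y / (1.0 - e)
    return total / len(outcomes)

# Toy data: the raw model is overconfident (always 0.9), but only half the units are treated.
treated = [1, 0] * 50
raw = [0.9] * 100
outcomes = [2.0 if t else 1.0 for t in treated]  # true effect is 1.0
calibrate = fit_histogram_calibrator(raw, treated)
calibrated = [calibrate(s) for s in raw]
print(ipw_ate(raw, treated, outcomes))         # badly biased
print(ipw_ate(calibrated, treated, outcomes))  # recovers 1.0
```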

machine learning, natural language, propensity score model

arXiv.org Artificial Intelligence

2306.00382

Country:

North America > United States > New York > New York County > New York City (0.14)
North America > Greenland (0.04)
North America > Barbados > Saint Michael > Bridgetown (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Natural Language (0.67)


Adversarial Calibrated Regression for Online Decision Making

Kuleshov, Volodymyr, Deshpande, Shachi

arXiv.org Artificial Intelligence, Jun-1-2023

Accurately estimating uncertainty is an essential component of decision-making and forecasting in machine learning. However, existing uncertainty estimation methods may fail when data no longer follows the distribution seen during training. Here, we introduce online uncertainty estimation algorithms that are guaranteed to be reliable on arbitrary streams of data points, including data chosen by an adversary. Specifically, our algorithms perform post-hoc recalibration of a black-box regression model and produce outputs that are provably calibrated -- i.e., an 80% confidence interval will contain the true outcome 80% of the time -- and that have low regret relative to the learning objective of the base model. We apply our algorithms in the context of Bayesian optimization, an online model-based decision-making task in which the data distribution shifts over time, and observe accelerated convergence to improved optima. Our results suggest that robust uncertainty quantification has the potential to improve online decision-making.
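A calibrated 80% interval should contain the outcome 80% of the time. A simple way to recalibrate a black-box Gaussian forecaster online is to track its probability integral transform (PIT) values F_t(y_t) and replace the nominal quantile levels with the empirical quantiles of the PITs seen so far. The sketch below is a plain empirical-CDF recalibrator assumed for illustration. It lacks the adversarial guarantees and regret bounds of the paper's algorithms. The class name and the toy stream are hypothetical.

```python
import random
from bisect import insort
from statistics import NormalDist

class OnlineQuantileRecalibrator:
    """Recalibrates a Gaussian forecaster online from the PIT values F_t(y_t) seen so far."""

    def __init__(self):
        self.pits = []  # kept sorted

    def _level(self, q):
        # Empirical q-quantile of past PIT values; falls back to q before any data arrives.
        if not self.pits:
            return q
        idx = round(q * (len(self.pits) - 1))
        return min(max(self.pits[idx], 1e-6), 1.0 - 1e-6)

    def interval(self, mu, sigma, coverage=0.8):
        dist = NormalDist(mu, sigma)
        lo = (1.0 - coverage) / 2.0
        return dist.inv_cdf(self._level(lo)), dist.inv_cdf(self._level(1.0 - lo))

    def update(self, mu, sigma, y):
        insort(self.pits, NormalDist(mu, sigma).cdf(y))

# Toy stream: the base model always predicts N(0, 1), but outcomes follow N(0, 2^2).
rng = random.Random(0)
rec = OnlineQuantileRecalibrator()
raw_lo, raw_hi = NormalDist(0.0, 1.0).inv_cdf(0.1), NormalDist(0.0, 1.0).inv_cdf(0.9)
n, raw_hits, cal_hits = 4000, 0, 0
for _ in range(n):
    mu, sigma = 0.0, 1.0
    lo, hi = rec.interval(mu, sigma)  # predict before seeing y
    y = rng.gauss(0.0, 2.0)
    raw_hits += raw_lo <= y <= raw_hi
    cal_hits += lo <= y <= hi
    rec.update(mu, sigma, y)          # then learn from the outcome
raw_cov, cal_cov = raw_hits / n, cal_hits / n
print(raw_cov, cal_cov)
```

The overconfident base model's nominal 80% interval covers far fewer outcomes, while the recalibrated interval's empirical coverage approaches 80% as PIT values accumulate.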

artificial intelligence, calibration, machine learning

arXiv.org Artificial Intelligence

2302.12196

Country:

North America > United States (0.04)
North America > Barbados > Saint Michael > Bridgetown (0.04)
Europe > United Kingdom > Scotland > City of Edinburgh > Edinburgh (0.04)

Genre: Research Report > New Finding (0.54)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.34)


Tailoring Language Generation Models under Total Variation Distance

Ji, Haozhe, Ke, Pei, Hu, Zhipeng, Zhang, Rongsheng, Huang, Minlie

arXiv.org Artificial Intelligence, Feb-26-2023

The standard paradigm of neural language generation adopts maximum likelihood estimation (MLE) as the training objective. From a distributional view, MLE in fact minimizes the Kullback-Leibler divergence (KLD) between the distribution of the real data and that of the model. However, this approach forces the model to assign non-zero (sometimes large) probability mass to all training samples regardless of their quality. Moreover, in attempting to cover the low-probability regions of the data distribution, the model systematically overestimates the probability of corrupted text sequences, which we conjecture is one of the main reasons for text degeneration during autoregressive decoding. To remedy this problem, we leverage the total variation distance (TVD), which is robust to outliers, and develop practical bounds for applying it to language generation. Then, we introduce the TaiLr objective, which balances the tradeoff of estimating TVD. Intuitively, TaiLr downweights real data samples that have low model probabilities, with tunable penalization intensity. Experimental results show that our method alleviates the overestimation of degenerated sequences without sacrificing diversity and improves generation quality on a wide range of text generation tasks.
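Forward KL(data || model) grows without bound as the model assigns vanishing probability to an observed sample, which is why MLE is pushed to cover every training sample. TVD is half the L1 distance and never exceeds 1. The sketch below compares the two on a toy three-outcome distribution; the numbers are illustrative, not from the paper. The weighting function p / (γ + (1 - γ)·p) is our reading of TaiLr's per-sample downweighting and should be checked against the paper. In training it would multiply each sample's negative log-likelihood as a constant coefficient, equal to 1 (plain MLE) at γ = 0.

```python
import math

def kl(p, q):
    """KL(p || q) for discrete distributions; infinite if q misses mass that p has."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return math.inf
            total += pi * math.log(pi / qi)
    return total

def tvd(p, q):
    """Total variation distance: half the L1 distance, always in [0, 1]."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

def tailr_style_weight(p_model, gamma):
    """Hypothetical TaiLr-style coefficient: 1 at gamma=0 (plain MLE), p_model at gamma=1."""
    return p_model / (gamma + (1.0 - gamma) * p_model)

def model_with_tail(eps):
    """Toy model that puts probability eps on the rare third outcome."""
    return [0.5 - eps / 2, 0.5 - eps / 2, eps]

data = [0.5, 0.49, 0.01]  # last outcome: a rare, possibly noisy sample
for eps in (1e-2, 1e-4, 1e-8):
    model = model_with_tail(eps)
    print(eps, kl(data, model), tvd(data, model))
```

As eps shrinks, KL keeps growing while TVD stays near 0.01, so a TVD-style objective does not force the model to inflate mass on the rare sample. The weight shrinks for low-probability samples once γ > 0, which is the downweighting the abstract describes.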

computational linguistic, machine learning, natural language

arXiv.org Artificial Intelligence

2302.13344

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)

Genre:

Research Report > New Finding (0.48)
Research Report > Experimental Study (0.46)

Industry: Transportation > Air (0.46)
